Union Join
The Union node is used to create a union between 2 or more tables, adding the union to the database schema. The Union node can be connected to any Select (excluding Multi Select), Preparation, and Column Operation nodes. The Union function combines rows (vertically) from multiple tables, appending the rows in the second table to the first. Each table should have the same number of columns in it.
The Union node can be used to combine rows from tables from the datasource, or to combine datasource tables with new tables generated by data cleansing and preparation functions. For example, both the Date Range and Summarize functions generate new tables; these can then be joined to the original table from the datasource.
A typical use case for a union is if you have two (or more) lists, consisting of the same column structure but different values, and you want to combine those lists into one. For instance, you may have two lists of sales people and number of items they sold, from two different stores. You can combine the rows from both tables using the Union function to produce one master list.
Create a Union or Stack Join (vertical)
Connect the Union node to the tables to be combined in the order in which they should be appended to the new table; the rows of each table are appended in the order in which the tables are connected to the Union node. The given tables should have the same number of columns, and the same data type.
With the Union node selected, go to the Properties panel to configure the union:
Resulting Table Name: name the new table.
Union All: disable this option to eliminate duplicate rows.
Add Source Table Name Column: add a column that lists the name of the source column for each row.
In this example, we have 3 Excel files from 3 different stores: 'Store A', 'Store B', and 'Store C'. Each spreadsheet contains a table called 'Transactions', which has the following columns: Sales Person ID, Number of Transactions, Items of Items Sold. Note that the Sales Person IDs '5' and '9' appear in 2 lists (for 'Store A' and 'Store C'):
After importing our spreadsheets into Pyramid, we want to create a new table that will combine all rows in each of the 3 Transactions tables into one table, along with a new column listing the original source table name for each row. This new table should look like this:
To achieve this, we need to connect each of the 3 tables to a Union node in the Data Flow.
From the Properties panel (green arrow below), Union All is selected because we know that there are duplicates in the Sales Person ID column, where sales people worked in multiple stores. 'Add Source Table Name Column' is enabled so that we can see which store each sales person worked in.
The resulting table seen in the Preview panel (blue highlight) combines all rows in each of the three lists: